Kenneth Tay
Oct 10, 2019
dplyr
select(): pick variables/columns by their names
mutate(): create new variables/columns based on existing ones
arrange(): reorder rows
filter(): pick rows by their values
summarize(): collapse many rows down to a single summary
group_by(): perform operations at a group level
ALL of these functions take a dataset as input.
The dataset is either given as the first argument or piped in using %>%.
ALL of these functions return a dataset!
You can do three things with this returned dataset:
%>% syntax with dplyr
Take the mtcars dataset, select just the wt and mpg columns, then select rows with mpg < 15
%>% syntax with dplyr
+ syntax with ggplot2
tidyr package: gather() and separate()
A function is a named block of code which
We’ve already seen a number of functions in R! For example,
## [1] TRUE
The function is.character takes the input given to it in the parentheses and returns TRUE or FALSE, depending on whether the input is of type character or not.
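As a quick illustration of a function taking an input and returning a value, here is a minimal sketch. The inputs "hello" and 42 and the variable names are made up for this example and are not from the original slides.

```r
# is.character() returns TRUE for character input and FALSE otherwise
is_string <- is.character("hello")   # "hello" is a character string
is_number <- is.character(42)        # 42 is numeric, not character

print(is_string)   # TRUE
print(is_number)   # FALSE
```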
Others we’ve seen: str(), head(), rm(), ggplot(), select(), …
We can see what a function does by typing in ? followed by the function name in the R console.
The most important syntax in R is the function call. All R syntax has function calls underlying it.
A function call consists of:
## [1] NA
## [1] -1
abs(x): If x is positive, return x. If x is negative, return x without the negative sign.
## [1] 2.6
%>%
%>% is implemented by the magrittr package
When the dplyr package is loaded, magrittr is loaded too, so %>% is available
%>% is “syntactic sugar”: makes code easier to understand
The value on the left of %>% becomes the first argument in the function on the right of %>%
## [1] 2.6
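The first-argument rule can be checked directly. The sketch below assumes the magrittr package is installed; the variable names x, direct and piped are chosen here for illustration.

```r
library(magrittr)  # provides %>%; it is also available once dplyr is loaded

x <- -2.6
direct <- abs(x)        # ordinary function call
piped  <- x %>% abs()   # x becomes the first argument of abs()
identical(direct, piped)

# Any further arguments stay inside the parentheses on the right-hand side,
# so these two calls are equivalent
rounded_direct <- round(2.567, digits = 1)
rounded_piped  <- 2.567 %>% round(digits = 1)
```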
%>% syntax with dplyr
Take the mtcars dataset, select just the wt and mpg columns, then select rows with mpg < 15
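One way to write this task, as a sketch assuming dplyr is installed. The names low_mpg and nested are hypothetical; the nested version is included only to show what the pipe saves you from writing.

```r
library(dplyr)

# Pipe version: read top to bottom, one step per line
low_mpg <- mtcars %>%
    select(wt, mpg) %>%
    filter(mpg < 15)

# Same computation without the pipe: has to be read from the inside out
nested <- filter(select(mtcars, wt, mpg), mpg < 15)

low_mpg
```

Each step receives the dataset returned by the previous step as its first argument, which is why the dataset name appears only once.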
+ syntax with ggplot2
library(ggplot2)
ggplot(data = mtcars, mapping = aes(x = wt, y = hp)) +
    geom_point() +
    labs(title = "Horsepower vs. Weight", x = "Weight",
         y = "Horsepower") +
    theme_classic()
+ for ggplot2 only?
Question: How do we find out what a function does? What inputs does it accept, what does it output, etc…
First answer: Google it! Google “R <function name>”
A (probably) better answer: Documentation in R itself!
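A small sketch of how to reach that documentation, using sample() as the example function. The help commands are shown as comments because they open a help page in an interactive session; the variable name sample_args is made up here.

```r
# In an interactive R console, either of these opens the help page:
#   ?sample
#   help("sample")

# The argument list (the "Usage" part of the help page) can also be
# inspected from code
args(sample)

# Arguments that have a default value carry it in the formals
sample_args <- formals(sample)
names(sample_args)     # argument names
sample_args$replace    # default value of the replace argument
```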
sample(): Description
sample(): Usage
What comes after the = sign: default value for that argument
sample(): Arguments
sample(): Details
sample(): Value
## [1] 8 7 2 3 1 5 9 10 4 6
## [1] 10 8 9 3 1 5 6 4 2 7
## [1] 9 2 1 10 10 7 10 7 8 3
## [1] 6 2 5 9 4 1 8 7 10 3
## [1] 10 6 4 1 7 3 5 4 6 1
## [1] 3 8 7 6 2
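The calls that produced the outputs above are not preserved in this text. The sketch below shows calls that would give outputs of the same shape (a permutation, draws with repeats, and a shorter draw); it is an assumption about what was run, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the draws can be reproduced

perm      <- sample(10)                  # random permutation of 1:10
with_repl <- sample(10, replace = TRUE)  # 10 draws from 1:10, repeats allowed
five      <- sample(10, size = 5)        # 5 distinct values from 1:10

perm
with_repl
five
```

Because replace defaults to FALSE, the first and third calls never repeat a value.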
tidyr::gather()
E.g. dataset of no. of cases for each country
## # A tibble: 3 x 3
##   country     `1999` `2000`
##   <chr>        <dbl>  <dbl>
## 1 Afghanistan    745   2666
## 2 Brazil       37737  80488
## 3 China       212258 213766
How to make a line plot of no. of cases by year for each country?
Probably want something like
Problem: Column names are values of the variable year.
Solution: Reshape dataset:
## # A tibble: 6 x 3
##   country     year   cases
##   <chr>       <chr>  <dbl>
## 1 Afghanistan 1999     745
## 2 Brazil      1999   37737
## 3 China       1999  212258
## 4 Afghanistan 2000    2666
## 5 Brazil      2000   80488
## 6 China       2000  213766
Solution: Reshape dataset using tidyr’s gather()
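A self-contained sketch of the reshape, assuming dplyr and tidyr are installed. The values are copied from the table above; the construction of df with tibble() and the name long are assumptions, since the slides do not show how df was created.

```r
library(dplyr)
library(tidyr)

# Wide dataset: one column per year (values copied from the table above)
df <- tibble(
    country = c("Afghanistan", "Brazil", "China"),
    `1999`  = c(745, 37737, 212258),
    `2000`  = c(2666, 80488, 213766)
)

# Gather the two year columns into a key column (year) and a value column (cases)
long <- df %>% gather(`1999`, `2000`, key = "year", value = "cases")
long
```

The year column comes out as character because it is built from column names, which is why a plot needs as.numeric(year).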
df %>% gather(`1999`, `2000`, key = "year", value = "cases") %>%
    ggplot() +
    geom_line(aes(x = as.numeric(year), y = cases, col = country))
tidyr::separate()
E.g. dataset of rate (cases / population) for each country
## # A tibble: 6 x 3
##   country      year rate
##   <chr>       <dbl> <chr>
## 1 Afghanistan  1999 745/19987071
## 2 Afghanistan  2000 2666/20595360
## 3 Brazil       1999 37737/172006362
## 4 Brazil       2000 80488/174504898
## 5 China        1999 212258/1272915272
## 6 China        2000 213766/1280428583
How to get cases and population into columns of their own?
Solution: Use tidyr’s separate()
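A self-contained sketch of the split, assuming dplyr and tidyr are installed. The values are copied from the rate table above; the dataset name rates and the result name split_rates are hypothetical, since the slides do not name this dataset.

```r
library(dplyr)
library(tidyr)

# Dataset with cases and population packed into one character column
rates <- tibble(
    country = rep(c("Afghanistan", "Brazil", "China"), each = 2),
    year    = rep(c(1999, 2000), times = 3),
    rate    = c("745/19987071", "2666/20595360",
                "37737/172006362", "80488/174504898",
                "212258/1272915272", "213766/1280428583")
)

# Split rate at the "/" into two new columns
split_rates <- rates %>%
    separate(rate, into = c("cases", "population"), sep = "/")
split_rates
```

The new columns are still character. separate() also accepts convert = TRUE, which tries to convert them to a more suitable type.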
## # A tibble: 6 x 4
##   country      year cases  population
##   <chr>       <dbl> <chr>  <chr>
## 1 Afghanistan  1999 745    19987071
## 2 Afghanistan  2000 2666   20595360
## 3 Brazil       1999 37737  172006362
## 4 Brazil       2000 80488  174504898
## 5 China        1999 212258 1272915272
## 6 China        2000 213766 1280428583
None to D4: drought levels of increasing severity
Optional material
tidyr functions: gather and spread
gather: Used when some column names are not variables, but values of a variable
spread: Opposite of gather
tidyr functions: separate and unite
separate: Used to separate values in one column into multiple columns
unite: Opposite of separate
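A short sketch of the two "opposite" functions, assuming dplyr and tidyr are installed. The data reuses values from the country tables earlier in these slides (only the Afghanistan rows for the unite() part); the names long, wide, parts and joined are hypothetical.

```r
library(dplyr)
library(tidyr)

# spread(): turn the values of year back into column names
long <- tibble(
    country = rep(c("Afghanistan", "Brazil", "China"), times = 2),
    year    = rep(c("1999", "2000"), each = 3),
    cases   = c(745, 37737, 212258, 2666, 80488, 213766)
)
wide <- long %>% spread(key = year, value = cases)

# unite(): paste cases and population back into a single rate column
parts <- tibble(
    country    = c("Afghanistan", "Afghanistan"),
    year       = c(1999, 2000),
    cases      = c("745", "2666"),
    population = c("19987071", "20595360")
)
joined <- parts %>% unite(rate, cases, population, sep = "/")

wide
joined
```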